Can Day 0 Community members predict C. difficile CFU?

Can we predict the endpoint C. difficile CFU with the community structure at Day 0?

The endpoint includes data from Days 2, 3, and 10, so it might be misleading because it pools CFU from different time points.

Does community structure predict C. difficile CFU on the same day?

RF cfu vs community by day

RF all mice/days together

Does removing the early euthanized samples improve prediction in days 1 to 3?

All CFU compared to same day community without mice euthanized early

Eliminating mice euthanized early from the RF model gives similar R^2 and MSE, but there is a slight advantage to using the community OTU features of Day 0 to predict Day 1 CFU. In addition, the R^2 value increases with increasing features, whereas when all mice are used the Day 1 R^2 only decreases with increasing features. Of note, this is accompanied by an increase in the % MSE attributed to OTU15 (Akkermansia), which has seemed to stand out in all other days/analyses.

Interestingly, Akkermansia does not appear to have the same relationship when CFU and community are compared on the same day. This could suggest that Akkermansia promotes the initial colonization of C. difficile. At first glance, OTU135 (Coriobacteriaceae) seems to stand out for predicting CFU on the same day as well as from Day 0.


Can we predict moribundity with the community structure at Day 0?

randomForest

Confusion Matrix

| Observed Euth_Early | Predicted 0 | Predicted 1 | class.error |
|---|---|---|---|
| 0 | 13 | 3 | 0.187500 |
| 1 | 1 | 38 | 0.025641 |

OOB error rate = 7.27%
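
As a consistency check, the reported error rates can be recomputed from the confusion matrix counts above. This assumes rows are the observed classes and columns the predicted classes, which is the randomForest convention:

$$
\begin{aligned}
\text{class.error}_0 &= \frac{3}{13 + 3} = 0.1875 \\
\text{class.error}_1 &= \frac{1}{1 + 38} \approx 0.0256 \\
\text{OOB error} &= \frac{3 + 1}{13 + 3 + 1 + 38} = \frac{4}{55} \approx 7.27\%
\end{aligned}
$$

The row totals (16 and 39) also match the control and case counts reported in the ROC output.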

Mice that were euthanized early but that the model predicts were not (Mouse Tag - Cage): 188 LINE, 2096 OUT, 2542 OP, 389 578

AUCRF

## 
## Call:
## roc.formula(formula = Predict_early_euth_df$Euth_Early ~ otu_euth_probs[,     2])
## 
## Data: otu_euth_probs[, 2] in 16 controls (Predict_early_euth_df$Euth_Early 0) < 39 cases (Predict_early_euth_df$Euth_Early 1).
## Area under the curve: 0.9904
## 
## Call:
## roc.formula(formula = cv10f_all_resp ~ cv10f_all_pred)
## 
## Data: cv10f_all_pred in 1600 controls (cv10f_all_resp 1) < 3900 cases (cv10f_all_resp 2).
## Area under the curve: 0.9839
## 
## Call:
## roc.formula(formula = aucrf_data$Euth_Early ~ LOO_probs)
## 
## Data: LOO_probs in 16 controls (aucrf_data$Euth_Early 0) < 39 cases (aucrf_data$Euth_Early 1).
## Area under the curve: 0.992
